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Summary 

The  history  of  the  use  of  the  Cholesky  decomposition  in  the  estimation 
and  prediction  of  autoregressive  moving  average  time  series  is  reviewed  and 
a  modification  of  Ansley’s  (1979)  estimation  method  is  suggested.  This 
modification  is  motivated  by  the  results  of  Newton  and  Pagano  (1981)  in 
prediction  theory  and  results  in  a  significant  reduction  in  the  calculations 
required  to  evaluate  the  exact  likelihood.  It  is  also  shown  how  the  modified 
method  can  be  used  to  obtain  predictors  and  prediction  variances. 
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1.  Introduction 


Let  (e(t),  t  =  0,  +  1,  ...}  be  a  white  noise  time  series  of  zero 
mean  uncorrelated  random  variables  having  variance  a2.  Then  the  time 
series  (Y(t),  t  -  0,  +  1,  . ..}  satisfying 

p  q 

£  ot(j )  Y (t-j  )  =  l  0(k)  e  (t-k) ,  a  (0)  =  0(0)  =  1  , 
j*0  k=0 

is  called  an  autoregressive  moving  average  process  of  order  p,  q  (ARMA 
(p,q)) .  If  q  =  0,  Y(*)  is  called  an  autoregressive  process  of  order  p, 

AR (p) ,  while  if  p  =  0  then  Y(*)  is  a  moving  average  process  of  order  q, 
MA(q).  Modern  time  series  analysis  has  found  the  ARMA  process  extremely 
useful  in  modeling  and  predicting  future  values  of  a  time  series.  However, 
the  computational  tasks  involved  in  obtaining  efficient  estimators  of  the 
parameters  ot ( • ) ,  0(*)»  and  a2  and  in  obtaining  predictors  of  future 
values  of  Y( • )  are  extremely  difficult  and  have  led  to  the  development  of 
a  wide  variety  of  methods  for  accomplishing  these  tasks.  These  methods 
approach  the  problem  from  many  points  of  view  including  1)  modified 
Newton  Raphson  likelihood  maximization  (Hannan  (1969),  Parzen  (1971), 

Akaike  (1973),  for  example),  2)  nonlinear  least  squares  (Box  and  Jenkins 
(1970),  for  example),  3)  Kalman  filtering  techniques  (Akaike  (1974), 

Harvey  and  Phillips  (1979),  Jones  (1980),  Pearlman  (1980),  for  example), 

4)  the  use  of  the  Cholesky  matrix  triangularization  technique  (Whittle 
(1963),  Pagano  and  Parzen  (1973),  Pagano  (1976),  Pagano  (1973),  Phadke  and 
Kedem  (1978),  Ansley  (1979),  Newton  (1980),  and  Newton  and  Pagano  (1981), 
for  example),  and  others. 


The  purpose  of  this  paper  is  to  review  the  history  of  the  Cholesky 
algorithm  as  applied  to  ARMA  models  and  to  show  how  one  can  easily  modify 
the  estimation  procedure  of  Ansley  (1979),  which  is  currently  regarded 
as  the  most  computationally  efficient  exact  likelihood  estimation  technique 
for  at  least  the  case  where  p  >  q  or  q  <  5  (see  Pearlman  (1980)) ,  to  obtain 
a  significant  reduction  in  the  number  of  computations  required. 

Section  2  describes  the  ARMA  process  estimation  and  prediction  problems 
and  reviews  the  use  of  the  Cholesky  algorithm,  while  the  proposed  modification 
is  given  in  section  3. 

2.  The  Estimation  and  Prediction  Problems 

T 

Given  a  realization  YT  =  (Y(l),  Y(T))  from  an  ARMA(p,q)  process, 

one  often  desires  estimators  of  the  a(-),  B(*)>  and  a2  as  well  as  memory~t, 

horizon-h,  minimum  mean  square  error  linear  predictors  Y(t  +  h|t)  and  prediction 

variances  (J2  =*  E(Y(t+h)  -  Y(t+h|t))2  of  Y(t+h)  given  Y  =  (Y(l)  , . . .  ,Y (t)  )T 

c ,  n  -t 

for  a  variety  of  values  h^,  hg  and  t^,  of  t  and  h. 

The  usual  procedure  is  to  obtain  a  Gaussian  likelihood  identification 

/V  /V  A 

<*(•)>  8(*),  and  a2  as  the  values  of  the  parameters  maximizing  the  Gaussian 
likelihood  function 

L( a,  6,a2|YT)  =  (2iO'T/2|ryjTr1exp(-  Y^r"^  Y^) 

where  I*  =  Cov(Yt)  =  (E(Y(j)Y(k)))  =  ^<1  j-k| )) ,  j ,  k  =  1,  . . . ,  t.  The 

Gaussian  assumption  is  often  used  also  so  that  Y(t+h|t)  and  a2  ,  ,  which 

in  general  are  given  by  Y(t+h|t)  =  ^  h  Yt  and  a2  h  =  1^(0)  “  rt  h  rY^trt  h 
where  rt  h  *t  y,  “  rt  h  ■  (R^Ct+h-l) , . . .  .R^h)  )T  ,  can  be  obtained 

as  the  conditional  mean  and  variance  of  Y(t+h)  given  Y  . 


3 


With  the  advent  of  efficient  numerical  algorithms  for  maximizing 
nonlinear  functions  that  require  a  user  to  only  supply  a  routine  that 
evaluates  the  function  for  specified  values  of  its  arguments  (see  Dennis 
and  More  (1977),  for  example),  the  emphasis  on  finding  the  identification 
of  a( • ) *  0(.) ,  and  o2  has  been  on  finding  efficient  algorithms  for 
evaluating  L,  or  equivalently  a  more  easily  evaluated  function  having 
maxima  at  the  same  values  of  its  parameters.  The  major  efforts  in  this 
regard  have  been  to  apply  the  Kalman  filter  algorithm  and  the  Cholesky 
decomposition  algorithm.  Each  of  these  methods  were  originally  used  in 
ARMA  modeling  as  a  means  of  obtaining  predictors  and  prediction  variances. 

2.1  The  Cholesky  Decomposition 

Let  A  be  a  symmetric  (nxn)  matrix.  Then  A  is  positive  definite  if 
n  n 

T 

and  only  if  it  can  be  decomposed  as  the  product  A^  =  (Cholesky 

T 

decomposition)  or  A  =  LA  D  L.  (modified  Cholesky  decomposition)  where 
n  A,n  a, n  a ^ n 

Ma  ^  is  an  (nxn)  lower  triangular  matrix  with  positive  diagonal  elements 

and  L.  is  a  unit  lower  triangular  matrix  while  D4  is  a  diagonal  matrix 
A,n  A,n 

containing  the  squares  of  the  diagonal  elements  of  M  .  Further  these 

A ,  n 

decompositions  are  unique  and  nested,  i.e.,  for  example,  L  is  the  lower 

Ay  I\ 

triangular  matrix  in  the  decomposition  of  the  KxK  principal  minor  of  A^  , 

K  =  1,  ...»  n.  Thus  the  (j,k)  element  of  L  „  can  be  denoted  L.  .  .  for 

A,  Jv  A,  j  ,  K 

j,  k  <  K. 

2.2  The  Cholesky  Decomposition  and  Prediction 

In  his  seminal  work  on  time  series  prediction.  Whittle  (1963)  suggested 

that  such  a  triangularlzatlon  would  be  useful  in  finding  Y(t+h|t)  and  o2  .. 

t,h 

Pagano  and  Parzen  (1973)  and  Pagano  (1976)  showed  that  for  an  MA(q)  process 
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VCt+hit)  -jjh  4>t+hit+h_k  h-0. 


L  o 


,  h>q 


q 


h-l 

7  j2  J) 

k=Q  i , t+h, t+h-k  Y,t+h-k,t+h-k 


Vk.K-J 


=  0 


»  j>q 


lim  W  K  K  i  =  B(j)  ,  j  =  1,  ....  q, 
K-x»  J 


lim 

K-*» 


Y,K,K 


min(j-l,q) 

where  e(l)  =  Y(l)  and  e(j)  «  Y(j)  -  E  .  .  ,  e(j-k),  j>2  is  the 

k=i 

jth  element  of  Y^  for  any  K  j.  Thus  one  need  only  calculate  the 

successive  rows  of  and  the  corresponding  e(*)  until  convergence  of  its 
q  +  1  nonzero  elements  to  the  3(*)  is  achieved,  at  which  point  one  replaces 
1*Y  ^  and  K  K  by  8(j)  an^  respectively  in  the  above  formulas. 

Newton  (1980)  discusses  the  extension  of  this  algorighm  to  the  multivariate 
MA(q)  process. 

Pagano  and  Parzen  (1973)  further  suggested  that  this  procedure  could  be 
extended  to  the  ARMA  (p,q)  case  by  applying  the  autoregressive  filter  a(-) 
to  Y(p+1),  Y(T)  and  using  the  MA(q)  algorithm  combined  with  an  AR(p) 

prediction  algorithm. 

Finally  Newton  and  Pagano  (1981)  show  that  for  a  general  purely  nondeter- 


mlnistic  covariance  stationary  time  series  Y(.) 
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t+h-1 

Y<t+h|t) '  k'h  Vt+h,t+h-k  e(t+h-» 


h-1 

2  2 

^t  ,h  =  kfQ  S.t+h.t+h-k  °Y,t+h-k,t+h-k 


lim  4  v  K-i  =  B»(J)  »  J  -  0 

K*»  Y’K’K  J 


11m  Dv  if  K  =  0 

K-x»  Y’K’K 


where  the  0  (*)  and  a2  are  the  coefficients  and  the  noise  variance  of  the 

OO  X  00 

MAC®)  representation  of  Y(*)  and  e(l)  =  Y(l),  while 


j-1 

e(j)  =  Y(j )  -  Z  L  e(k)  ,  j  >  2  . 
k=l  ,J’ 


Applying  this  general  theory  to  the  ARMA  (p,q)  process  and  further 

factoring  into  w^ere  Z(*)  is  an  AR(p)  process  with  coefficients 

“*1  — T  T 

«(•)  and  noise  variance  o2  and  rx  K  =  LZ  K  FY  RLZ  K  =  Lx  KDX  Rl<x  K  has  last 
K-q  rows  and  columns  being  the  covariance  matrix  of  an  MA(q)  process,  Newton 
and  Pagano  obtain 

P 

Y(t+h|t)  =  X(t+h|t)  -  E  <x(j )Y (t+h-j | t)  , 

j=l 


h-1 

2 

°t,h  kfQ  DX,t+h-k,t+h-k  < 


t+h 


^t+h-k  Lz.t+h,tLx,-e,t+h-k 

w  - 

Y  (t+h-j  1 1)  -  Y  (t+h-j )  if  j  >  h  , 


X(t+h|t) 


q 

I  L, 

k-h 


X, t+h, t+h-k 


e (t+h-k),  h  *  1,  ...»  q 


h  >  q 
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:(j)  is  the  jth  element  of  1^  ^  X^,  K  _>  j,  and  Xj^  *  L”1^  Yr  .  Further, 


lim  *  j  1 »  •  •  •  >  ^  » 

K-*»  *,K,K-j 


lim  D  „  ,  -  a2  . 
K-*°  X,K,k 


A  number  of  other  results  are  given  by  Newton  and  Pagano  which  greatly 
reduce  the  amount  of  storage  and  computations  necessary  for  obtaining 
and  *  We  note  that  this  method  is  different  than  that  given  by 
Pagano  and  Parzen  (1973)  in  that  instead  of  leaving  the  initial  values 
Y(l),  . Y(p)  untransformed,  the  Newton  and  Pagano  algorithm  forms  the 
time  series  X(*)  by  Yk  *  *‘e‘ 

X(j)  -  Y ( j )  ,  j  =  1 

j-i 

I  a  .(k)Y(j-k)  ,  j  *  2,  ....  p 

k=0  J 

P 

E  a(k)  Y(j-k)  ,  j  >  p 
k=0 


where  the 
recursion 


(k)  are  easily  obtained  by  performing  the  Levinson  (1974) 
for  j  =  p-1,  . ..,  1  ,  with  <Xp+^(k)  =  a(k): 


(1)  =  aw(i)  ~  yiti+nyiW-P 


1  ~  af+i*J+1) 


i  -  i. 


j  <  P 


2.3  The  Cholesky  Decomposition  and  Parameter  Estimation 


Pagano  (1973)  discusses  the  use  of  the  Cholesky  decomposition  in 
solving  the  Yule  Walker  equations  for  an  AR(p)  process.  Phadke  and  Kedem 
(1978)  and  Ansley  (1979)  apply  the  Cholesky  decomposition  to  the  problem 
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of  evaluating  the  likelihood  function  of  the  MA(q)  and  ARMA(p,q)  processes 
respectively,  essentially  using  the  same  transformation  and  decomposition 
that  Pagano  and  Parzen  (1973)  used  for  obtaining  predictors.  We  briefly 
describe  Ansley's  method  and  show  in  the  next  section  how  it  can  be  improved 
by  incorporating  the  theory  of  section  2.2. 

Define  a  new  time  series,  with  m  =  max(p,q+l), 

W  (t)  =  Y (t)  ,  t  =  1,  . . . ,  m 

P 

T.  a(j)Y(t-j)  ,  t  =  m+1,  ...,  T  . 
j“0 

Thus  in  matrix  notation  this  is  ^Y^  where  My  T  is  a  (TXT)  unit 

T 

lower  triangular  matrix  and  ^  -  Cov(W^)  =  My  TTy  TMy  which  is  a 

banded  matrix  of  bandwidth  q,  i.e. ,  there  are  q  nonzero  subdiagonals,  from 

T 

row  nrl-1  on.  Thus  finding  the  Cholesky  decomposition  ^  =  My  ^My  ^  and 
the  new  random  vector  e^  =  can  be  done  recursively  only  having  to 

compute  q  elements  in  each  row  of  Then  e^  =  M^^My  ^Y^  and  thus 

eT-N-NT(0,  M^T  =  V  ' 

Thus  maximizing  L  is  equivalent  to  maximizing 

T 

P (a,  B,  o2|e  )  =  (2it)_T^2|M^  |-1exp(-  £  . 

-  -  -i  w’i  t=l  2a2 

3.  Using  the  Prediction  Results  in  Ansley’s  Method 

Thus  it  is  clear  that  Ansley’s  method  consists  of  factoring  the  lower 
triangular  matrix  in  the  Cholesky  decomposition  of  Ty  ^  into  the  product  of 
the  two  easily  calculated  lower  triangular  matrices  My  ^  and  My  But  from 
section  2.2  it  is  easily  seen  that  if  this  factorization  is  done  as 
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*r 


^  ^  one  can  call  upon  the  results  of  Newton  and  Pagano  to  halt 

calculation  of  successive  rows  of  Lv  once  their  elements  have  converged  to 

A,  l 

B(l),  . 3(q).  The  experience  of  the  present  authors  is  that  the  speed  of 

q  k 

this  convergence,  which  is  a  function  of  how  close  the  zeros  of  1'  £(k )z 

k-0 

are  to  the  unit  circle,  is  such  that  a  significant  reduction  in  computations 
is  obtained,  particularly  if  T  is  large  relative  to  q.  Further  one  can  then 
use  the  prediction  theory  of  section  2.2  to  obtain  desired  predictors  and 
prediction  variances. 

We  note  that  section  2.2  can  also  be  used  to  prove  that  the  elements 

of  the  rows  of  Ansley's  ^  also  converge  to  the  B(0  and  our  experience 

is  that  the  rate  of  convergence  is  virtually  identical  with  the  above  method. 

Thus  either  factorization  could  be  used  but  it  seems  more  natural  to 

factor  b  into  an  exact  AR(p)  covariance  matrix  factor  L  and  an  exact 
Y  ,  I  L  ,  1 

MA(q)  (except  for  the  first  max(p,q-KL)  rows  and  columns)  covariance  matrix 
factor  ^  and  use  the  methods  of  section  2.2  to  find  the  likelihood,  the 
predictors,  and  the  prediction  variances. 


REFERENCES 


Akaike,  H.  (1973).  Maximum  likelihood  identification  of  Gaussian  auto¬ 
regressive  moving  average  models,  Biometrika,  60,  255-265. 

Akaike,  H.  (1974).  Markovian  representation  of  stochastic  processes  and 
its  application  to  the  analysis  of  autoregressive  moving  average 
processes.  Annals  of  the  Institute  of  Statistical  Mathematics,  26, 

363-387. 

Ansley,  C.  (1979).  An  algorithm  for  the  exact  likelihood  of  a  mixed 
autoregressive-moving  average  process.  Biometrika,  66,  59-65. 

Box,  G.  E.  P.  and  Jenkins,  G.  M.  (1970).  Time  Series  Analysis  forecasting 
and  control.  Holden-Day,  San  Francisco. 

Dennis,  J.  E.  and  More,  J.  J.  (1977).  Quasi-Newton  methods,  motivation 
and  theory.  SIAM  Rev. ,  19,  46-89. 

Hannan,  E.  J.  (1969).  The  estimation  of  mixed  moving  average  autoregressive 
systems.  Biometrika,  56,  579-593. 

Harvey,  A.  C.  and  Phillips,  G.  D.  (1979).  Maximum  likelihood  estimation  of 
regression  models  with  autoregressive  -  moving  average  disturbances. 
Biometrika,  66,  49-58. 

Jones,  R.  H.  (1980).  Maximum  likelihood  fitting  of  ARMA  models  to  time 
series  with  missing  observations.  Technometrics,  22,  389-395. 

Levinson,  N.  (1974).  The  Wiener  RMS  error  criterion  in  filter  design  and 
prediction.  Journal  of  Mathematical  Physics,  15 ,  261-278. 

Newton,  H.  J.  (1980).  Efficient  estimation  of  multivariate  moving  average 
autocovariances.  Biometrika,  67 ,  227-231. 

Newton,  H.  J.  and  Pagano,  M.  (1981).  The  finite  memory  prediction  of 
covariance  stationary  time  series.  Submitted  for  publication. 

Pagano,  M.  (1973).  An  algorithm  for  fitting  autoregressive  schemes. 

Applied  Statistics,  21,  274-281. 

Pagano,  M.  (1976).  On  the  linear  convergence  of  a  covariance  factorization 
algorithm.  Journ.  Assoc.  Comp.  Mach.,  23,  310-316. 

Pagano,  M.  and  Parzen,  E.  (1973).  Timesboard;  A  time  series  package.  Proc. 
of  Comp.  Sci.  and  Stat. :  7th  Ann.  Symp.  on  the  Interface,  ed.  by 
W.  J.  Kennedy,  Statistical  Laboratory,  Iowa  State  Univ.,  Ames,  Iowa. 

Parzen,  E.  (1971).  Efficient  estimation  of  stationary  time  series  mixed  schemes. 
Proc.  39th  Session  of  I.S.I.,  Washington,  D.C.. 


Pearlman,  J.  G.  (1980).  An  algorithm  for  the  exact  likelihood  of  high- 
order  autoregressive  -  moving  average  process.  Biomet rika,  67, 
232-233. 

Phadke,  M.  S.  and  Kedem,  G.  (1978).  Computation  of  the  exact  likelihood 
function  of  multivariate  moving  average  models.  Biometrika,  65, 
511-519. 

Whittle,  P.  (1963).  Prediction  and  Regulation  by  linear  least  squares 
methods.  English  Universities  Press,  London. 


